(57) ABSTRACT

The present invention relates generally to computer 
software, and more specifically, to a method and system of 
making computer software resistant to tampering and 
reverse-engineering. "Tampering" occurs when an attacker
makes unauthorized changes to a computer software pro- 
gram such as overcoming password access, copy protection 
or timeout algorithms. Broadly speaking, the method of the 
invention is to increase the tamper-resistance and obscurity 
of computer software code by transforming the data flow of 
the computer software so that the observable operation is 
dissociated from the intent of the original software code. 
This way, the attacker can not understand and decode the 
data flow by observing the execution of the code. A number 
of techniques for performing the invention are given, includ- 
ing encoding software arguments using polynomials, prime 
number residues, converting variables to new sets of bool- 
ean variables, and defining variables on a new 
n-dimensional vector space. 
TAMPER RESISTANT SOFTWARE
ENCODING

The present invention relates generally to computer 
software, and more specifically, to a method and system of
making computer software resistant to tampering and 
reverse-engineering. 

BACKGROUND OF THE INVENTION 

The market for computer software in all of its various
forms is recognized to be very large and is growing every
day. In industrialized nations, hardly a business exists that
does not rely on computers and software, either directly or
indirectly, in its daily operations. As well, with the expansion
of powerful communication networks such as the
Internet, the ease with which computer software may be 
exchanged, copied and distributed is also growing daily. 

With this growth of computing power and communication 
networks, a user's ability to obtain and run unauthorized or 
unlicensed software is becoming less and less difficult, and 
a practical means of protecting such computer software has 
yet to be devised. 

Computer software is generally written by software devel- 
opers in a high-level language which must be compiled into 
low-level object code in order to execute on a computer or 
other processor. 

High-level computer languages use command wording 
that closely mirrors plain language, so they can be easily 
read by one skilled in the art. Typically, source code files 
have a suffix that identifies the corresponding language. For
example, Java™ is a currently popular high-level language
and its source code typically carries a name such as
"prog1.java". High-level structure refers to, for example, the
class hierarchy of object oriented programs, or the module
structure in Ada™ programs.

Object-code generally refers to machine-executable code, 
which is the output of a software compiler that translates 
source code from human-readable to machine-executable 
code. In the case of Java™, there is one file per class and the
files have names such as "className.class", where
"className" is the name of the class. Such files are generally
called ".class files". 

The low-level structure of object code refers to the actual 
details of how the program works. Low-level analysis usually
focuses on, or at least begins with, one routine at a time.
This routine may be, for example, a procedure, function or 
method. Analysis of individual routines may be followed by 
analyses of wider scope in some compilation tool sets. 

The low-level structure of a software program is usually
described in terms of its data flow and control flow. Data- 
flow is a description of the variables together with the 
operations performed on them. Control-flow is a description 
of how control jumps from place to place in the program 
during execution, and the tests that are performed to determine
those jumps.

Tampering refers to changing computer software in a 
manner that is against the wishes of the original author.
Traditionally, computer software programs have had limitations
encoded into them, such as requiring password access,
preventing copying, or allowing the software only to execute
a predetermined number of times or for a certain duration.
However, because the user has complete access to the 
software code, methods have been found to identify the code 
administering these limitations. Once this coding has been
identified, the user is able to overcome these programmed 
limitations by modifying the software code. 
Since a piece of computer software is simply a listing of
data bits, ultimately, one cannot prevent attackers from 
making copies and making arbitrary changes. As well, there 
is no way to prevent users from monitoring the computer 
software as it executes. This allows the user to obtain the 
complete data-flow and control-flow, so it was traditionally
thought that the user could identify and undo any protection. 
This theory seemed to be supported in practice. This was the 
essence of the copy-protection against hacking war that was 
common on Apple-II and early PC software, and has resulted 
in these copy-protection efforts being generally abandoned. 

Since then, a number of attempts have been made to 
prevent attacks by "obfuscating" or making the organisation 
of the software code more confusing and hence, more 
difficult to modify. Software is commercially available to
"obfuscate" source code in manners such as:
globally replacing variable names with random character 
strings. For example, each occurrence of the variable 
name "SecurityCode" could be replaced with the char- 
acter string "lxcd385mxc" so that it is more difficult for 
an attacker to identify the variables he is looking for; 
deleting comments and other documentation; and 
removing source-level structural indentations, such as the 
indentation of loop bodies, to make the loops more 
difficult to read. 
While these techniques obscure the source code, they do 
not make any attempts to deter modification. Once the 
attacker has figured out how the code operates, he is free to 
modify it as he chooses.
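
By way of a non-limiting illustration, the identifier-renaming style of obfuscation described above can be sketched as follows. The helper name `rename_identifiers` and the sample source line are hypothetical and are not taken from any commercial obfuscator; the sketch only shows that the renamed program keeps exactly the same comparison, which an attacker may still locate and patch.

```python
import re


def rename_identifiers(source, mapping):
    """Replace every whole-word occurrence of each mapped identifier.

    `mapping` is a hypothetical table of original name -> replacement string.
    """
    if not mapping:
        return source
    # \b anchors keep "SecurityCodes" from being rewritten by accident.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], source)


original = "if SecurityCode == entered: grant()  # SecurityCode check"
obscured = rename_identifiers(original, {"SecurityCode": "lxcd385mxc"})
print(obscured)
```

The structure of the statement is unchanged by the renaming, which is consistent with the observation that such techniques obscure the source code without deterring modification.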

A more complex approach to obfuscation is presented in 
issued U.S. Pat. No. 5,748,741 which describes a method of 
obfuscating computer software by artificially constructing a 
"complex wall". This "complex wall" is preferably a "cas- 
cade" structure, where each output is dependent on all 
inputs. The original program is protected by merging it with 
this cascade, by intertwining the two. The intention is to 
make it very difficult for the attacker to separate the original 
program from the complex wall again, which is necessary to 
alter the original program. This system suffers from several 
major problems: 
large code expansion, exceeding a hundred fold, required 
to create a sufficiently elaborate complex wall, and to
accommodate its intertwining with the original code; 
and 

low security since the obfuscated program may be divided 
into manageable blocks which may be de-coded 
individually, allowing the protection to be removed one 
operation at a time. 
Other researchers are beginning to explore the potential 
for obfuscation in ways far more effective than what is 
achieved by current commercial code obfuscators, though 
still inferior to the obfuscation of issued U.S. Pat. No.
5,748,741. For example, in their paper "Manufacturing
cheap, resilient, and stealthy opaque constructs", Conference
on Principles of Programming Languages (POPL),
1998 [ACM 0-89791-979-3/98/01], pp. 184-196, C.
Collburg, C. Thomborson, and D. Low propose a number of
ways of obscuring a computer
program. In particular, Collburg et al. disclose obscuring the
decision process in the program, that is, obscuring those 
computations on which binary or multiway conditional 
branches determine their branch targets. Clearly, there are 
major deficiencies to this approach, including: 

because only control-flow is being addressed, domain
transforms are not used and data obfuscation is weak;
and 
there is no effort to provide tamper-resistance. In fact,
Collburg et al. do not appear to recognize the distinction
between tamper-resistance and obfuscation, and as
a result, do not provide any tamper-proofing at all.
The approach of Collburg et al. is based on the premise
that obfuscation can not offer a complete solution to tamper 
protection. Collburg et al. state that: "... code obfuscation 
can never completely protect an application from malicious 
reverse engineering efforts. Given enough time and 
determination, Bob will always be able to dissect Alice's
application to retrieve its important algorithms and data 
structures." 

As noted above, it is desirable to prevent users from 
making small, meaningful changes to computer programs, 
such as overriding copy protection and timeouts in demonstration
software. It is also necessary to protect computer
software against reverse engineering which might be used to 
identify valuable intellectual property contained within a 
software algorithm or model. In hardware design, for 
example, vendors of application specific integrated circuit
(ASIC) cell libraries often provide precise software models 
corresponding to the hardware, so that users can perform 
accurate system simulations. Because such a disclosure 
usually provides sufficient detail to reveal the actual cell 
design, it is desirable to protect the content of the software
model. 

In other applications, such as emerging encryption and 
electronic signature technologies, there is a need to hide 
secret keys in software programs and transmissions, so that
software programs can sign, encrypt and decrypt transactions
and other software modules. At the same time, these
secret keys must be protected against being leaked. 

There is therefore a need for a method and system of 
making computer software resistant to tampering and 
reverse engineering. This design must be provided with
consideration for the necessary processing power and real 
time delay to execute the protected software code, and the 
memory required to store it. 

SUMMARY OF THE INVENTION

It is therefore an object of the invention to provide a 
method and system of making computer software resistant to 
tampering and reverse engineering which addresses the 
problems outlined above. 

The method and system of the invention recognizes that 
attackers cannot be prevented from making copies and 
making arbitrary changes. However, the most significant 
problem is "useful tampering" which refers to making small 
changes in behaviour. For example, if the trial software was 
designed to stop working after ten invocations, tampering 
that changes the "ten" to "hundred" is a concern, but 
tampering that crashes the program totally is not a priority 
since the attacker gains no benefit. 

Data-flow describes the variables together with operations
performed on them. The invention increases the complexity 
of the data-flow by orders of magnitude, allowing "secrets" 
to be hidden in the program, or the algorithm itself to be 
hidden. "Obscuring" the software coding in the fashion of 
known code obfuscators is not the primary focus of the
invention. Obscurity is necessary, but not sufficient for, 
achieving the prime objective of the invention, which is 
tamper-proofing. 

One aspect of the invention is broadly defined as a method 
of increasing the tamper-resistance and obscurity of computer
software code comprising the steps of transforming the
data flow in the computer software code to dissociate the
observable operation of the transformed computer software
code from the intent of the original software code.

A second aspect of the invention is broadly defined as a 
method of increasing the tamper-resistance and obscurity of 
computer software code comprising the steps of encoding 
the computer software code into a domain which does not 
have a corresponding semantic structure, to increase the 
tamper-resistance and obscurity of the computer software 
code. 

A further aspect of the invention is defined as a computer 
readable memory medium, storing computer software code 
executable to perform the steps of: compiling the computer 
software program from source code into a corresponding set 
of intermediate computer software code; encoding the inter- 
mediate computer software code into tamper-resistant inter- 
mediate computer software code having a domain which 
does not have a corresponding semantic structure, to 
increase the tamper-resistance and obscurity of the computer
software code; and compiling the tamper-resistant intermediate
computer software code into tamper-resistant computer
software object code. 

An additional aspect of the invention is defined as a 
computer data signal embodied in a carrier wave, the com- 
puter data signal comprising a set of machine executable 
code being executable by a computer to perform the steps of: 
compiling the computer software program from source code 
into a corresponding set of intermediate computer software 
code; encoding the intermediate computer software code 
into tamper-resistant intermediate computer software code 
having a domain which does not have a corresponding 
semantic structure, to increase the tamper-resistance and 
obscurity of the computer software code; and compiling the
tamper-resistant intermediate computer software code into 
tamper-resistant computer software object code. 

Another aspect of the invention is defined as an apparatus 
for increasing the tamper-resistance and obscurity of com- 
puter software code, comprising: front end compiler means 
for compiling the computer software program from source 
code into a corresponding set of intermediate computer 
software code; encoding means for encoding the intermedi- 
ate computer software code into tamper-resistant interme- 
diate computer software code having a domain which does 
not have a corresponding semantic structure, to increase the 
tamper-resistance and obscurity of the computer software 
code; and back end compiler means for compiling the 
tamper-resistant intermediate computer software code into 
tamper-resistant computer software object code. 

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other features of the invention will become 
more apparent from the following description in which 
reference is made to the appended drawings in which: 

FIG. 1 presents an exemplary computer system in which 
the invention may be embodied; 

FIG. 2 presents a flow chart of the invention applied to a 
software compiler in an embodiment of the invention; 

FIG. 3 presents a flow chart of a general algorithm for 
implementation of the invention; 

FIG. 4 presents a flow chart of a null coding routine in an 
embodiment of the invention; 

FIG. 5 presents a flow chart of a polynomial encoding 
routine in an embodiment of the invention; 

FIG. 6 presents a flow chart of a residue number encoding 
routine in an embodiment of the invention; 

FIG. 7 presents a flow chart of a bit-explosion coding 
routine in an embodiment of the invention; 
FIG. 8 presents a flow chart of a custom base coding
routine in an embodiment of the invention; and

FIG. 9a and 9b present a flow chart of the preferred
embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS OF THE INVENTION

The invention lies in a means for recoding software code
in such a manner that it is fragile to tampering. Attempts to
modify the software code will therefore cause it to become
inoperable in terms of its original function. The tamper-resistant
software may continue to run after tampering, but
no longer performs sensible computation.

The extreme fragility embedded into the program by
means of the invention does not cause execution to cease
immediately, once it is subjected to tampering. It is desirable
for the program to continue running so that, by the time the
attacker realizes something is wrong, the modifications and
events which caused the functionality to become nonsensical
are far in the past. This makes it very difficult for the attacker
to identify and remove the changes that caused the failure to
occur.

An example of a system upon which the invention may be
performed is presented as a block diagram in FIG. 1. This
computer system 10 includes a display 12, keyboard 14,
computer 16 and external devices 18.

The computer 16 may contain one or more processors or
microprocessors, such as a central processing unit (CPU) 20.
The CPU 20 performs arithmetic calculations and control
functions to execute software stored in an internal memory
22, preferably random access memory (RAM) and/or read
only memory (ROM), and possibly additional memory 24.
The additional memory 24 may include, for example, mass
memory storage, hard disk drives, floppy disk drives, magnetic
tape drives, compact disk drives, program cartridges
and cartridge interfaces such as those found in video game
devices, removable memory chips such as EPROM or
PROM, or similar storage media as known in the art. This
additional memory 24 may be physically internal to the
computer 16, or external as shown in FIG. 1.

The computer system 10 may also include other similar
means for allowing computer programs or other instructions
to be loaded. Such means can include, for example, a
communications interface 26 which allows software and
data to be transferred between the computer system 10 and
external systems. Examples of communications interface 26
can include a modem, a network interface such as an
Ethernet card, a serial or parallel communications port.
Software and data transferred via communications interface
26 are in the form of signals which can be electronic,
electromagnetic, optical or other signals capable of being
received by communications interface 26.

Input and output to and from the computer 16 is administered
by the input/output (I/O) interface 28. This I/O
interface 28 administers control of the display 12, keyboard
14, external devices 18 and other such components of the
computer system 10.

The invention is described in these terms for convenience
purposes only. It would be clear to one skilled in the art that
the invention may be applied to other computer or control
systems 10. Such systems would include all manner of
appliances having computer or processor control including
telephones, cellular telephones, televisions, television set
top units, lap top computers, personal digital assistants and
automobiles.

Compiler Technology

In the preferred embodiment, the invention is implemented
in terms of an intermediate compiler program running
on a computer system 10. Standard compiler techniques
are well known in the art. Two standard references
which may provide necessary background are "Compilers
Principles, Techniques, and Tools" 1988 by Alfred Aho,
Ravi Sethi and Jeffrey Ullman (ISBN 0-201-1008-6), and
"Advanced Compiler Design & Implementation" 1997 by
Steven Muchnick (ISBN 1-55860-320-4). The preferred
embodiment of the invention is described with respect to
static single assignment, which is described in Muchnick.

FIG. 2 presents an example of such an implementation in
a preferred embodiment of the invention. Generally, a software
compiler is divided into three components, described
as the front end, the middle, and the back end. The front end
30 is responsible for language dependent analysis, while the
back end 32 handles the machine-dependent parts of code
generation. Optionally, a middle component may be
included to perform optimizations that are independent of
language and machine. Typically, each compiler family will
have only one middle, with a front end 30 for each high-level
language and a back end 32 for each machine-level language.
All of the components in a compiler family can
generally communicate in a common intermediate language
so they are easily interchangeable.

The first component of the software compiler is a front
end 30, which receives source code, possibly in a high-level
language, and generates what is commonly described as
internal representation or intermediate code. There are many
[illegible].

In the preferred embodiment of the invention, this intermediate
code is then encoded to be tamper-resistant by the
middle compiler 34 of the invention to make the desired
areas of the input software tamper-resistant. The operation
of the invention in this manner will be described in greater
detail hereinafter.

Finally, the compiler back end 32 receives the tamper-resistant
intermediate code and generates object code. The
tamper-resistant object code is then available to the user to
link, thereby creating an executable image of the
source code for execution on a computer system 10.

The use of compiler front ends 30 and back ends 32 is well
known in the art. Typically these compiler components are
commercially available "off the shelf", although this is not
the case for Java™, and are suited to particular computer
software and computers. For example, if a Compiler Writer
wishes to compile a C++ program to operate on a 486
microprocessor, he would pair a front end 30 which compiles
high level C++ into intermediate code, with a back end
32 which compiles this intermediate code into object code
executable on the 486 microprocessor.

In the preferred embodiment of the invention, the tamper-resistant
encoding compiler 34 is implemented with a front
end 30 that reads in Java™ class files and a back end 32 that
writes out Java™ class files. However, the invention can
easily be implemented using front ends 30 for different
languages and machine binaries, and with back ends 32 for
different machines or even de-compilers for various source
languages. For example, it is likely that an embodiment will
[illegible]. [illegible] mix-and-match
[illegible] Java™ class files and outputting C source, for
example.

In the preferred embodiment of the invention, a standard
compiler front end 30 is used to generate intermediate code
in static single assignment form which represents the semantics
of the program, however any similar semantic representation
may be used. To better understand the invention, it
is useful to describe some additional terminology relating to
static single assignment.

Static Single Assignment and Other Flow-Exposed 
Forms 

A middle compiler intended to perform optimization or 
other significant changes to the way computation is 
performed, typically uses a form which exposes both 
control- and data-flow so that they are easily manipulated.
Such an intermediate form may be referred to as flow- 
exposed form. 

In particular, Static Single Assignment (SSA) form is a
well-known, popular and efficient flow-exposed form used 
by software compilers as a code representation for perform- 
ing analyses and optimizations involving scalar variables. 
Effective algorithms based on Static Single Assignment have 
been developed to address constant propagation, redundant 
computation detection, dead code elimination, induction 
variable elimination, and other requirements. 

Static single assignment is a fairly recent way of repre- 
senting semantics that makes it easy to perform changes on 
the program. Converting to and from static single assign- 
ment is well understood and covered in standard texts like 
Muchnick. Many optimizations can be performed in static 
single assignment and can be simpler than the traditional 
non-static single assignment formulations. 

Basically, in static single assignment form, each variable 
is cloned a number of times, once for each assignment to that 
variable. This has the advantageous property that each 
Variable Register (VR) has exactly one place that assigns to 
it and the operations which consume the value from this 
particular assignment are exactly known. Each definition of 
a variable is given a unique version, and different versions 
of the same variable can be regarded as different program 
variables. Each use of a variable version can only refer to a
single reaching definition. This yields an intermediate rep- 
resentation in which expressions are represented in directed 
acyclic graph (DAG) form, that is, in tree form, if there are 
no common subexpressions, and the expression DAGs are 
associated with statements that use their computed results. 
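
By way of a non-limiting illustration, the versioning described above may be sketched for straight-line code as follows. The statement format `(target, operator, operands)`, the function name `to_ssa` and the `_n` version suffix are assumptions made for this sketch only; control-flow merges, which require the merge functions discussed hereinafter, are deliberately left out.

```python
def to_ssa(statements):
    """Rename a straight-line list of (target, operator, operands) tuples.

    Each assignment creates a fresh version of its target, and each use is
    rewritten to the single version that reaches it. Operands that were never
    assigned (program inputs, literals) are left unchanged.
    """
    version = {}  # variable name -> number of its latest version
    renamed = []
    for target, operator, operands in statements:
        # Uses are renamed before the target is bumped, so "x = x + y"
        # reads the previous version of x and defines the next one.
        uses = tuple(
            f"{name}_{version[name]}" if name in version else name
            for name in operands
        )
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}_{version[target]}", operator, uses))
    return renamed


program = [
    ("x", "const", ("1",)),
    ("x", "add", ("x", "y")),
    ("z", "mul", ("x", "x")),
    ("x", "add", ("z", "x")),
]
ssa = to_ssa(program)
for statement in ssa:
    print(statement)
```

In the output each renamed target appears on the left-hand side exactly once, so the consumers of any particular assignment can be found by searching for that one name.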

One important property in static single assignment form is 
that each definition dominates all of its uses in the control 
flow graph of the program, unless the use is a φ-assignment.
A more detailed description of φ-assignments is given
hereinafter.

Another important property in static single assignment 
form is that identical versions of the same variable have the 
same value on any execution path starting with the initial 
assignment and not looping back to this assignment. Of 
course, assignments in loops may assign different values on 
different iterations, but the property just given still holds. 

When several definitions of a variable reach a merging 
node in the control flow graph of the program, a merge 
function assignment statement called a phi, or φ, assignment,
is inserted to merge them into the definition of a new 
variable version. This merging is required to maintain the 
semantics of single reaching definitions. Merge nodes are 
covered in the standard text books such as Muchnick and the 
present invention does not require them to be handled any 
differently. 

Of course, the method of the invention could be applied 
to flow-exposed forms other than SSA, where these provide 
similar levels of semantic information, as in that provided in 
Gnu CC. Gnu CC software is currently available at no cost 
from the Free Software Foundation.

Similarly, the method of the invention could be applied to 
software in its high level or low level forms, if such forms 
were augmented with the requisite control- and data-flow
information. This flexibility will become clear from the
description of the encoding techniques described hereinafter. 

Preferably, the method of the invention is implemented in 
the form of a conventional compiler computer program 
operating on a computer system 10 or similar data processor. 
As shown in FIG. 3, the compiler program reads an input 
source program, such as a program written in the C++ 
programming language, stored in the memory 22 or mass 
storage 24, and creates a static single assignment interme- 
diate representation of the source code in the memory 22 
using the compiler front end 30. A simple example of this 
compiling into intermediate code follows. 

Code Block 1A shows a simple loop in the FORTRAN
language, which could form a part of the source program
input to the compiler front end 30. Code Block 1B is a static
single assignment intermediate representation of Code Block
1A output from the compiler front end 30. In static single
assignment, each virtual register appears in the program
exactly once on the left-hand side of an assignment. The
label t is used herein to intentionally correspond to the
virtual register names of Code Blocks below.



Code Block 1A            Code Block 1B
(FORTRAN Loop)           (Static Single Assignment IR)

                              % t00 = 0, t01 = 1
                              % t02 = 5, t03 = 50
K = 0                    s0   t04 = copy(t00)
J = 1                    s1   t05 = copy(t01)
DO 10 I = 1, 50          s2   t06 = copy(t01)
                         s10  t10 = φ(t04, t14)
                         s11  t11 = φ(t05, t13)
                         s12  t12 = φ(t06, t15)
L = J                    s3   t07 = copy(t11)
J = J + K                s4   t13 = iadd(t11, t10)
K = L                    s5   t14 = copy(t07)
10 CONTINUE              s6   t15 = iadd(t12, t01)
                         s7   t08 = ile(t15, t03)
                         s8   brt(t08, s10)
K = J + 5                s9   t09 = iadd(t13, t02)



Except for the initialization steps in the first five lines,
each line of Code Block 1B corresponds to a line of source
code in Code Block 1A. The sources and destinations for all
the operations are virtual registers stored in the memory and
labelled t1 to t10. The "iadd" instructions of the above Code
Blocks represent CPU integer add operations, the "ile"
instruction is an integer less-than-or-equal-to comparison,
and the "brt" instruction is a "branch if true" operation. Merge
nodes are represented by the φ function in the intermediate
code statements s10, s11 and s12. The loop of Code Block
1A requires that the backward branch at s8 use the statement
number s10 to reference the head of the loop.
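
By way of a non-limiting illustration, the correspondence between the two Code Blocks can be checked by executing both. The sketch below is an interpretation of the listings, not part of the original disclosure: the function names are hypothetical, the virtual registers are held in a dictionary, and each φ is assumed to select its first operand on entry to the loop and its second operand on the backward branch, with all three φ statements evaluated together.

```python
def fortran_loop():
    """Direct transcription of the FORTRAN loop; returns the final (J, K)."""
    k, j = 0, 1
    for _ in range(1, 51):  # DO 10 I = 1, 50
        l = j
        j = j + k
        k = l
    k = j + 5
    return j, k


def ssa_loop():
    """Statement-by-statement execution of the static single assignment form."""
    t = {"t00": 0, "t01": 1, "t02": 5, "t03": 50}
    t["t04"] = t["t00"]                  # s0
    t["t05"] = t["t01"]                  # s1
    t["t06"] = t["t01"]                  # s2
    entered_from_top = True
    while True:
        # s10-s12: the phi statements read their operands simultaneously.
        if entered_from_top:
            merged = (t["t04"], t["t05"], t["t06"])
        else:
            merged = (t["t14"], t["t13"], t["t15"])
        t["t10"], t["t11"], t["t12"] = merged
        t["t07"] = t["t11"]              # s3
        t["t13"] = t["t11"] + t["t10"]   # s4  iadd
        t["t14"] = t["t07"]              # s5
        t["t15"] = t["t12"] + t["t01"]   # s6  iadd
        t["t08"] = t["t15"] <= t["t03"]  # s7  ile
        entered_from_top = False
        if not t["t08"]:                 # s8  brt back to s10 when true
            break
    t["t09"] = t["t13"] + t["t02"]       # s9  iadd
    return t["t13"], t["t09"]


print(fortran_loop() == ssa_loop())
```

Python integers do not overflow, so the sketch compares the mathematical results only; a fixed-width FORTRAN INTEGER may behave differently for values this large.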

Use of Optimizers 

Since the invention alters the organization of the software 
program beyond understanding, a lot of optimization tech- 
niques will become ineffective. Therefore, any desired opti- 
mization should be done before the tamper-resistant com- 
piling 36 in FIG. 3. Performing optimization after would 
require the tamper-resistant compiling routine to leave spe- 
cial coding to ensure that the optimization routine does not 
alter or remove essential coding. This would require a lot of 
additional code, and would be error-prone. This would also 
require a new optimization algorithm, as current algorithms 
do not take account of the special coding. Also, existing 
analysis techniques such as Data-Flow Analysis and Alias
Analysis may be used to guide the choice of coding scheme 
by replacing 'worst-case' data-flow connectivity with connectivity
closer to reality, so that recodings to achieve
matching codings are employed only where really needed.
For example, Range Analysis done as part of Data-Flow
Analysis can be used to determine how large the bases used
in the Residue Number Coding need to be.
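
By way of a non-limiting illustration, the relationship between a value range and the size of the residue bases may be sketched as follows. The details of the Residue Number Coding are not given in this passage, so the sketch assumes a conventional residue number system: pairwise coprime bases whose product exceeds the largest value reported by Range Analysis, with decoding by the Chinese Remainder Theorem. The function names and the list of candidate primes are hypothetical.

```python
from math import prod


def choose_bases(max_value, candidates=(7, 11, 13, 17, 19, 23, 29, 31)):
    """Take primes in order until their product exceeds max_value."""
    bases = []
    for prime in candidates:
        bases.append(prime)
        if prod(bases) > max_value:
            return bases
    raise ValueError("candidate list too small for the requested range")


def encode(value, bases):
    """Represent a value by its remainder modulo each base."""
    return [value % base for base in bases]


def add(left, right, bases):
    """Add two encoded values component by component."""
    return [(a + b) % base for a, b, base in zip(left, right, bases)]


def decode(residues, bases):
    """Recover the value with the Chinese Remainder Theorem."""
    total = prod(bases)
    value = 0
    for residue, base in zip(residues, bases):
        partial = total // base
        # pow(x, -1, m) is the modular inverse (Python 3.8 and later).
        value += residue * partial * pow(partial, -1, base)
    return value % total


bases = choose_bases(1000)  # Range Analysis is assumed to report values <= 1000
total = decode(add(encode(123, bases), encode(456, bases), bases), bases)
print(bases, total)
```

If the range were underestimated, results larger than the product of the bases would wrap around on decoding, which is why the range information matters when the bases are chosen.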

General Implementation of Tamper-Resistant 
Compiling 

The tamper-resistant encoding compiler 34 of the invention
receives and analyses the internal representation of
Code Block 1B. Based on its analysis, the tamper-resistant
encoding compiler 34 restructures portions of the interme- 
diate representation, thereby making it fragile to tampering. 

In general, the tamper-resistant encoding compiler 34 
performs three passes of the intermediate code graph for 
each phase of encoding, shown in FIG. 3 as steps 38 through 
46. In the preferred embodiment of the invention, the 
popular practice of dividing the compiler into a number of 
phases, several dozen, in fact, is being followed. Each phase 
reads the static single assignment graph and does only a little 
bit of the encoding, leaving a slightly updated static single 
assignment graph. This makes it easier to understand and to 
debug. A "phase control file" may be used to specify the 
ordering of the phases at step 38 and particular parameters
of each phase, for added flexibility in ordering phases. This
is particularly useful when one phase is to be tested by
inserting auditing phases before and/or after it, or when
debugging options are added to various phases to aid debugging.

Whenever variable codings are chosen, three passes of the 
intermediate code graph are generally required. In a first
pass, at step 40, the tamper-resistant encoding compiler 34
walks the SSA graph and develops a proposed system of 
re-codings. If the proposed codings are determined to be 
acceptable at step 42, which may require a second pass of the 
SSA graph, control proceeds to step 44, where the accept- 
able re-codings are then made in a third pass. If the proposed 
coding is found to contain mismatches at step 42, then 
recodings are inserted as needed to eliminate the mismatches 
at step 46. 
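
The check for mismatches and the insertion of recodings can be illustrated with a toy sketch. This is not the compiler of the invention: the copy-only operation list, the coding names and the helper names find_mismatches and apply_recodings are all assumptions made purely for illustration, and the proposed codings that the first pass would produce are simply supplied as a dictionary.

```python
# Toy model: each operation is a plain copy "dest := src", and every
# variable has a proposed coding (hypothetical labels such as "poly").

def find_mismatches(ops, codings):
    """Second pass: indices of copies whose two sides use different codings."""
    return [i for i, (dest, src) in enumerate(ops)
            if codings[dest] != codings[src]]


def apply_recodings(ops, codings):
    """Third pass: emit the code, inserting a RECODE before each mismatch."""
    out = []
    for dest, src in ops:
        if codings[dest] != codings[src]:
            tmp = f"{src}_as_{codings[dest]}"
            out.append(
                f"{tmp} := RECODE({src}, {codings[src]} -> {codings[dest]})")
            src = tmp
        out.append(f"{dest} := {src}")
    return out


# Proposed codings, as a first pass might have chosen them (assumed values).
ops = [("b", "a"), ("c", "b")]
codings = {"a": "poly", "b": "poly", "c": "residue"}

mismatches = find_mismatches(ops, codings)
emitted = apply_recodings(ops, codings)
```

Only the second copy crosses from one coding to another, so only that copy receives a RECODE operation in the emitted list.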

Once all of the encoding phases have been executed, the
resulting tamper-resistant intermediate code is then compiled
into object code for storage or machine execution by
the compiler back end 32.

This hardening of software has traditionally been thought
to be impossible. The usual reasoning is that the attacker can
"watch" the program execute, thereby obtaining the complete
data-flow and control-flow, so the attacker can undo
any protection.

Existing obfuscation techniques do not offer effective
protection because they do not hide how the program
actually runs. Therefore, existing decompiling tools may
observe the software execution and point to the code that the
attacker wishes to modify. The invention, however,
decouples or dissociates the actual, observable operation
from the corresponding software code so that the attacker
may not find the corresponding code. This is done by
transforming the domain of the data flow into a new domain
which does not have a corresponding high level semantic
structure. This new method makes it very difficult to fix any
reference points for variables in the tamper-resistant program
as everything has multiple interpretations.

Because of this dissociation, the invention may be applied
to small areas of the input software code. In a typical
application, much of the executable code need not be made
tamper-resistant since there is no need for it to be secure
from tampering. For example, encoding software which
creates a bit-mapped graphical user interface (GUI) would
be pointless as the information it conveys is immediately
evident to the user.

Obfuscation relies solely on "hiding" the organization of
the computer software for protection. Existing obfuscators
are weak, so a larger portion of the source code must be
obfuscated to ensure that some degree of obscurity is
achieved in the area of the program requiring protection. The
invention, in contrast, provides strong obfuscation, and
resists tampering both by obscurity and by extreme induced
fragility. Therefore, the invention need only encode the area
of the program requiring protection.

This allows the invention to be far more efficient in terms
of memory, processing power and execution time. For
example, if the source code requires 1 megabyte of memory,
but all of the security measures reside in a 5 kilobyte block,
tamper-resistant encoding that expands that 5 kilobyte block
by a factor of 20, in the manner of the invention, will only
increase the overall size of the input software program by
about 10%, from 1 megabyte to about 1.1 megabytes. In
contrast, if it were necessary to apply the process of the
invention to all of the source code, a program size of 20
megabytes, or 2000% of the original, would result.
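
The size comparison above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 megabyte = 1000 kilobytes) and a uniform twentyfold expansion of whatever code is encoded; the function name encoded_size_kb is introduced here only for illustration.

```python
def encoded_size_kb(total_kb, protected_kb, expansion):
    """Program size after expanding only the protected block."""
    return total_kb - protected_kb + protected_kb * expansion


total_kb = 1000.0     # 1 megabyte program (decimal kilobytes, an assumption)
protected_kb = 5.0    # block holding the security measures
expansion = 20.0      # growth factor of encoded code

partial = encoded_size_kb(total_kb, protected_kb, expansion)
full = encoded_size_kb(total_kb, total_kb, expansion)

partial_increase_pct = 100.0 * (partial - total_kb) / total_kb
full_increase_pct = 100.0 * (full - total_kb) / total_kb
```

Under these assumptions the selectively encoded program grows to 1095 kilobytes (an increase of 9.5%), while encoding everything yields 20000 kilobytes, that is, twenty times the original size.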

The method and system of the invention recognizes that
one cannot prevent attackers from making copies and making
arbitrary changes. However, the most significant problem
is "useful tampering" which refers to making small
changes in behaviour. For example, if the trial software was
designed to stop working after ten invocations, tampering
that changes the "ten" to "hundred" is a concern, but
tampering that crashes the program totally is not important.

In operation, the tamper-resistant encoding technique of
the invention will work much like a compiler from the user's
point of view, although the internal operations are very
different. Users may start with a piece of software that is
already debugged and tested, run that software through the
invention software and end up with new tamper-resistant
software. The new tamper-resistant software still appears to
operate in the same manner as the original software but it is
now hardened against tampering.

Wide Applications 

Tamper-resistant encoding in a manner of the invention 
has very wide possible uses: 

1. Protecting the innovation of a software algorithm. For 
example, if one wished to sell software containing a 
new and faster algorithm to solve the linear program- 
ming problem, one would like to sell the software 
without disclosing the method. 

2. Protecting the innovation of a software model. In 
hardware design, it is common for vendors of ASIC cell 
libraries to provide precise software models so that 
users can perform accurate system simulations. 
However, it would be desirable to do so without giving 
away the actual cell design. 

3. Wrapping behaviour together. Often, it is desirable to 
write some software that will perform a function "A" if 
and only if an event "B" occurs. For example, a certain 
function is performed only if payment is made. 

4. Hiding secrets, such as adding encryption keys or 
electronic signatures into a program, so that the pro- 
gram can sign things and encrypt/decrypt things, with- 
out leaking the key. 




Clearly, there are other applications and combinations of 
applications. For example, an electronic key could be 
included in a decoder program and the decoding tied to 
electronic payment, thereby providing an electronic com- 
merce solution. 

Properties of Tamper-Resistance 

The general approach is that each variable in the software
program being encoded is mapped to some new set of
variables, chosen so that the mapping is not easily reversible
to the original. Then, all the arithmetic is performed in
the domain of the new set of variables when the program
executes.

A number of different techniques are presented herein for
effecting this tamper-resistant encoding, which are described
as null, polynomial, residue number, bit-exploded,
bit-tabulated and custom base coding. These techniques may be
applied using a large number of possible codings. As well,
these and other coding techniques can be combined. For
example, after using the residue number technique, each of
the resulting components can be further encoded using the
polynomial technique.

These techniques are presented as examples of how the 
invention may be embodied, and one skilled in the art would 
be able to identify other similar techniques for effecting the 
invention. These techniques may be described in terms of 
the following properties: 
1. Anti-hologram
This property is the opposite of that demonstrated by a
hologram. If a piece of a hologram is removed, the
whole image of the hologram will still be visible, but
in reduced resolution. In contrast, the invention
disperses the definition of a single variable into
several locations so that a single modification or
deletion in any one of those locations will corrupt its
value. This property is desirable as it magnifies the
detrimental effects of any tampering. Of the techniques
described herein, Residue Number, Bit-Explosion
and Custom Base have this property.
For example, the assignment: 

may be encoded with three different equations stored 
in different areas of the program as the following 
assignments: 

b:-y~z 



In this simple example, the value of variable x will be 

modified if any of three assignments is modified. 
Generally, the sequence in which the assignments are 
made is significant and is taken into consideration 
while variables and assignments in the input soft- 
ware program are being encoded. 
2. Fake-robustness

This property describes tamper-resistant encoding
which allows a given value to be changed and have
the program still run, but because only a limited set
of values are actually sensible, certain values will
eventually lead to nonsensical computation. It is
desirable to maximize the number of encodings that
are fake-robust so that a tampered program will not
immediately crash. This makes debugging more difficult
as the attacker will have to analyse a larger
region of code for any change. All of the coding
techniques described above, except for the null
technique, demonstrate this property.

In the context of tamper-resistance, true robustness 
would allow the code to be modified somewhat with 
no change in semantics. Fake robustness does not 
preserve the semantics of the original software pro- 
gram when modified, so the modified software pro- 
gram continues to execute, but eventually will crash. 

For example, if an array A is known to have 100 
elements, then converting the expression A [i] to the 
expression A [i mod 100] makes it fake-robust in that 
variable i may take on any value and not cause an 
array bounds error. However, certain values of vari- 
able i may cause nonsensical operation elsewhere in 
the program without causing a complete failure. 
3. Togetherness property

In terms of encoded data flow, this property describes
a scenario in which the definitions of several
variables, preferably from different areas of the
program, are arithmetically tied together. Therefore,
an attacker cannot alter the encoded program by
changing a single value in a single place. This
increases the likelihood of any tampering causing a
crash in a different area of the program. The degree
of togetherness of the coding techniques presented
herein is low with the Polynomial transform technique,
which has a "1 to 1" correspondence, moderate
with Residue Numbers, which has a "1 to many"
correspondence, and high with Custom Base, which
has a "many to many" correspondence. Note that the
degree of togetherness for the polynomial encoding
can be increased by splitting up equations as previously
shown.

For example, original variables x, y and z, may be
encoded into t, u, and v using appropriate functions:

t = F1(x, y, z)
u = F2(x, y, z)
v = F3(x, y, z)

which may be decoded to yield the values of variables
x, y and z, using a complementary set of
appropriate functions:

x = G1(t, u, v)
y = G2(t, u, v)
z = G3(t, u, v)
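
As a minimal sketch of the togetherness property, the functions F1 to F3 and G1 to G3 can be taken to be simple integer linear combinations. The particular combinations below are an assumption chosen only because they invert exactly over the integers; the text does not prescribe them.

```python
def encode(x, y, z):
    """Hypothetical F1, F2, F3: each output mixes several inputs."""
    t = x + y          # F1
    u = y + z          # F2
    v = x + y + z      # F3
    return t, u, v


def decode(t, u, v):
    """Complementary G1, G2, G3 recovering the original variables."""
    x = v - u          # G1
    y = t + u - v      # G2
    z = v - t          # G3
    return x, y, z


t, u, v = encode(3, 4, 5)
restored = decode(t, u, v)

# Tampering with a single encoded value disturbs more than one original.
tampered = decode(t + 1, u, v)
```

Changing only t leaves x intact but shifts both y and z, so an attacker who alters one stored value cannot change one original variable in isolation.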

When described in terms of these three properties, the
custom base technique appears to offer the most tamper-resistant
encoding. However, consideration should be made
for the resulting impact on run-time expansion, code space
expansion, complexity of implementation, probability of
requiring recoding and other metrics. Recoding refers to the
addition of RECODE operations where mismatches would
otherwise occur between proposed encodings.

Each coding technique will have different time/space/complexity
trade-offs for different operations. For example,
residue number coding can handle large numbers for
addition, subtraction and multiplication, but can only handle
very restricted forms of division. Most texts state that
residue number division is impossible, but the invention
applies a method of division where the divisor is part of the
residue base.
Several techniques for realizing the invention will now be
described.

Null Coding

A null coding is one which does not affect the original
software program, that is, the original variable is represented
by the same value. There are many places in a program
where encoding is not particularly advantageous, for
example at the input and output points of the program. As the
inputs and outputs may be monitored from a known position
outside the program, they are easily identified by an attacker.
Rather than addressing the complexity of encoding the
inputs and outputs, with little return for the effort, it is more
convenient to use a null coding.

Null coding may be realized by adding a routine as shown
in FIG. 4, as one of the phases executed at step 38 of FIG.
3. As the SSA graph is traversed, one analyses each variable,
and at step 48, determines whether an identified variable is
one which is not to be hidden. As noted above, this may
include a variable which is an input, output or otherwise
pointless to hide. If so, then a null coding is performed at
step 50.

Also, as noted with regard to FIG. 3, this null coding is
recorded in the "phase control file". Therefore, if it is
determined at step 48 that null coding is not required, the
"phase control file" will be made aware at step 52, that
another form of coding may be performed.

By use of the decision block at step 54 and stepping
through the lines of SSA code at step 56, the balance of the
SSA graph is traversed.

Polynomial Coding

The polynomial encoding technique takes an existing set
of equations and produces an entirely new set of equations
with different variables. The variables in the original program
are usually chosen to have meaning in the real world,
while the new encoded variables will have no such meaning.
As well, the clever selection of constants and polynomials
used to define the new set of equations may allow the
original mathematical operations to be hidden.

This technique represents a variable x by some polynomial
of x, such as ax+b where a and b are some random
numbers. This technique allows us to hide operations
by changing their sense, or to distribute the definition
of a variable around in a program.

A convenient way to describe the execution of the polynomial
routine is in terms of a "phantom parallel program".
As the polynomial encoding routine executes and encodes
the original software program, there is a conceptual program
running in parallel, which keeps track of the encodings and
their interpretations. After the original software program has
been encoded, this "phantom parallel program" adds lines of
code which "decode" the output back to the original domain.

For example, if the SSA graph defines the subtraction of two
variables as:

z := x - y (1)

this equation may be hidden by defining new variables:

x' := ax + b (2)
y' := cy + d (3)
z' := ez + f (4)

Next, a set of random values for constants a, b, c, d, e, and
f is chosen, and the original equation (1) in the software
program is replaced with the new equation (5). Note that, in
this case, the constant c is chosen to be equal to -a, which
hides the subtraction operation from equation (1) by replacing
it with an addition operation:

z' := x' + y' (5)

The change in the operation can be identified by algebraic
substitution:

z' := a(x - y) + (b + d) (6)

Equation (5) is the equation that will replace equation (1)
in the software program, but the new equations (2), (3) and
(4) will also have to be propagated throughout the software
program. If any conflicts arise due to mismatches, RECODE
operations will have to be inserted to eliminate them.

In generating the tamper-resistant software, the transformations
of each variable are recorded so that all the necessary
relationships can be coordinated in the program as the
SSA graph is traversed. However, once all nodes of the SSA
graph have been transformed and the "decoding" lines of
code added at the end, the transformation data may be
discarded, including equations (3), (4) and (5). That is, the
"phantom parallel program" is discarded, so there is no data
left which an attacker may use to reverse engineer the
original equations.

Note that a subtraction has been performed by doing an
addition without leaving a negative operator in the encoded
program. The encoded program only has a subtraction
operation because the phantom program knows "c = -a". If
the value of the constant had been assigned as "c = a", then
the encoded equation would really be an addition. Also, note
that each of the three variables used a different coding and
there was no explicit conversion into or out of any encoding.

For the case of:

y := -x (7)

one could choose:

x' := ax + b, and (8)

y' := (-a)y + b (9)

which would cause the negation operation to vanish, and x
and y to appear to be the same variable. The difference is
only tracked in the interpretation.

Similarly, for the case of:

y := x + 5 (10)

one could choose:

y' := ax + (b + 5) (11)

causing the addition operation to vanish. Again, now there
are two different interpretations of the same value.

FIG. 5 presents a simple implementation of the polynomial
coding technique. At step 58, a line of the SSA graph
is analysed to determine whether it defines a polynomial
equation suitable for polynomial encoding. If so, a suitable
set of polynomial equations is defined at step 60 that
accomplishes the desired encoding.

As noted above, this technique is generally applied to
physically distribute the definition of a variable throughout
a program so a single assignment is usually replaced by a
system of assignments distributed throughout the program.

For the simple polynomial scheme, the values of constants
are generally unrestricted and the only concern is for
the size of the numbers. Values are chosen which do not
cause the coded program to overflow. In such a case, the
values of constants in these equations may be selected
randomly at step 62, within the allowable constraints of the
program. However, as noted above, judicious selection of
values for constants may be performed to accomplish certain
tasks, such as inverting arithmetic operations.

At the decision block of step 64 it is then determined
whether the entire SSA graph has been traversed, and if not,
the compiler steps incrementally to the next line of code by
means of step 66. Otherwise, the phase is complete.

Variations on this technique would be clear to one skilled
in the art. For example, higher order polynomials could be
used, or particular transforms developed to perform the
desired hiding or inversion of certain functions.
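
The hiding of a subtraction behind an addition can be sketched in a few lines. The sketch assumes linear codings x' = a*x + b and y' = c*y + d with c = -a, and interprets the result as z' = a*z + (b + d); the helper names, the random ranges and the sample values are illustrative assumptions only.

```python
import random


def choose_constants(rng):
    """Pick coefficients for x' = a*x + b and y' = c*y + d, with c = -a."""
    a = rng.randrange(2, 1000)
    b = rng.randrange(1, 1000)
    d = rng.randrange(1, 1000)
    return a, b, -a, d


def encode(value, scale, offset):
    """Linear coding: value' = scale * value + offset."""
    return scale * value + offset


def hidden_subtract(x_enc, y_enc):
    """The emitted operation is an addition of the encoded values."""
    return x_enc + y_enc


def decode_z(z_enc, a, b, d):
    """Invert the interpretation z' = a*z + (b + d)."""
    return (z_enc - (b + d)) // a


rng = random.Random(0)
a, b, c, d = choose_constants(rng)

x, y = 42, 17
z_enc = hidden_subtract(encode(x, a, b), encode(y, c, d))
z = decode_z(z_enc, a, b, d)

# Negation case: with y = -x coded as y' = (-a)*y + b, both variables
# hold the same encoded value, so no negation appears in the code.
same_value = encode(x, a, b) == encode(-x, -a, b)
```

The encoded program never performs a subtraction; only the interpretation held by the "phantom parallel program" (here, the constants passed to decode_z) reveals that z equals x - y.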
Residue Number Coding

This technique makes use of the "Chinese Remainder
Theorem" and is usually referred to as "Residue Numbers"
in text books (see "The Art of Computer Programming",
volume 2: "Seminumerical Algorithms", 1997, by Donald E.
Knuth, ISBN 0-201-89684-2, pp. 284-294, or see "Introduction
to Algorithms", 1990, by Thomas H. Cormen,
Charles E. Leiserson, and Ronald L. Rivest, ISBN 0-262-03141-8,
pp. 823-826). A "base" is chosen, consisting of a
vector of pairwise relatively prime numbers, for example: 3,
5 and 7. Then, each variable x is represented as a vector of
remainders when this variable is operated upon by the
"base", that is, x maps on to (x rem 3, x rem 5, x rem 7).

In this scheme, a "Modular Base" consists of several
numbers that are pairwise relatively prime. Two distinct
integers are said to be relatively prime if their only common
divisor is 1. A set of integers are said to be pairwise
relatively prime, if for each possible distinct pair of integers
from the set, the two integers of the pair are relatively prime.

An example of such a set would be {3, 5, 7}. In this base,
integers can be represented as a vector of remainders by
dividing by the base. For example:

0 = (0, 0, 0),
1 = (1, 1, 1),
5 = (2, 0, 5),
100 = (1, 0, 2), and
105 = (0, 0, 0).

Note that this particular base {3, 5, 7} has a period of 105,
which is equal to the product of 3x5x7, so that only integers
inside this range may be represented. The starting point of
the range may be chosen to be any value. The most useful
choices in this particular example would be [0, 104] or [-52,
52].

If two integers are represented in the same base, simple
arithmetic operations may be performed very easily.
Addition, subtraction and multiplication for example, may
be performed component wise in modular arithmetic. Again,
using the base of {3, 5, 7}:

if 1 = (1, 1, 1) and
5 = (2, 0, 5), then
1 + 5 = ((1 + 2) mod 3, (1 + 0) mod 5, (1 + 5) mod 7)
= (0, 1, 6).

Of course, 1+5=6, and 6 in residue form with the same
base is (0, 1, 6). Subtraction and multiplication are performed
in a corresponding manner.

Heretofore, division had been thought to be impossible,
but can be done advantageously in a manner of the invention.
First, however, it is of assistance to review the method
of solving for the residue numbers.

Converting from an integer to a corresponding Residue
Number is simply a matter of dividing by each number in the
base set to determine the remainders. However, converting
from a Residue Number back to the original integer is more
difficult. The solution as presented by Knuth is as follows.
Knuth also discusses and derives the general solution, which
will not be presented here:

For an integer "a" which may be represented by a vector
of residue numbers (a_1, a_2, ... a_k):

a = (a_1 c_1 + a_2 c_2 + ... + a_k c_k) (mod n) (12)

where:

a_i = a (mod n_i) for i = 1, 2, ... k

and:

c_i = m_i (m_i^-1 mod n_i) for i = 1, 2, ... k (13)

and:

m_i = n/n_i for i = 1, 2, ... k (14)

and where the notation "(x^-1 mod y)" used above denotes
that integer z such that xz (mod y) = 1. For example, (3^-1 mod
7) = 5 because 15 (mod 7) = 1, where 15 = 3x5.

In the case of this example, with a base (3, 5, 7), a vector
of solution constants, (c3=70, c5=21, c7=15), are calculated.
Once these constants have been calculated, converting a
residue number (1,1,1) back to the original integer is simply
a matter of calculating:

r_1 c_1 + r_2 c_2 + r_3 c_3 = 1 x 70 + 1 x 21 + 1 x 15
= 106 (15)

assuming a range of [0,104], multiples of 105 are subtracted
yielding an integer value of 1.
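
The conversion into and out of residue form, and the component-wise arithmetic, can be sketched as follows. This is only an illustration for the base {3, 5, 7} used in the text; the function names are assumptions, n is taken to be the product of the base as in the standard Chinese Remainder Theorem construction, and the modular inverse relies on the three-argument form of Python's built-in pow, which accepts a negative exponent from Python 3.8 onward.

```python
from math import prod

BASE = (3, 5, 7)  # pairwise relatively prime moduli from the text


def to_residue(x, base=BASE):
    """Encode an integer as its vector of remainders."""
    return tuple(x % n_i for n_i in base)


def solution_constants(base=BASE):
    """c_i = m_i * (m_i^-1 mod n_i), with m_i = n / n_i."""
    n = prod(base)
    constants = []
    for n_i in base:
        m_i = n // n_i
        constants.append(m_i * pow(m_i, -1, n_i))  # modular inverse
    return tuple(constants)


def from_residue(r, base=BASE):
    """Decode into the range [0, n - 1] by summing r_i * c_i modulo n."""
    n = prod(base)
    constants = solution_constants(base)
    return sum(r_i * c_i for r_i, c_i in zip(r, constants)) % n


def add(r, s, base=BASE):
    """Component-wise modular addition."""
    return tuple((a + b) % n_i for a, b, n_i in zip(r, s, base))


def mul(r, s, base=BASE):
    """Component-wise modular multiplication."""
    return tuple((a * b) % n_i for a, b, n_i in zip(r, s, base))


six = add(to_residue(1), to_residue(5))
thirty_five = mul(to_residue(5), to_residue(7))
```

For this base, solution_constants() evaluates to (70, 21, 15), matching the constants quoted above, and decoding (1, 1, 1) gives 106 mod 105 = 1.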

Most texts like Knuth discuss Residue Numbers in the
context of hardware implementation or high-precision integer
arithmetic, so their focus is on how to pick a convenient
base and how to convert into and out of that base. However,
in applying this technique to the invention, the concern is on
how to easily create many diverse bases.

In choosing a basis for Residue Numbers, quite a few
magic coefficients may be generated dependent on the bases.
By observation of the algebra, it is desirable to have different
bases with a large number of common factors. This can be
easily achieved by having a list of numbers which are
pairwise relatively prime, and each base just partitions these
numbers into the components. For example, consider the set
{16, 9, 5, 7, 11, 13, 17, 19, 23}, comprising nine small
positive integers which are either prime numbers or powers
of prime numbers. One can obtain bases for residual encoding
by taking any three distinct elements of this set. This
keeps the numbers roughly the same size and allows a total
range of 5,354,228,880 which is sufficient for 32 bits. For
example, one such base generated in this manner might be
{16*9*11, 5*13*23, 7*17*19} = {1584, 1495, 2261}.
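
One way to generate many diverse bases from the nine-element set is to partition it at random into three groups and multiply out each group, as in the example base above. The sketch below assumes equal-sized groups and uses illustrative helper names; it is not taken from the text.

```python
import random
from itertools import combinations
from math import gcd, prod

PRIME_POWERS = [16, 9, 5, 7, 11, 13, 17, 19, 23]  # set given in the text


def base_from_groups(groups):
    """Multiply out each group of the partition to obtain one modulus."""
    return [prod(group) for group in groups]


def random_base(rng, parts=3):
    """Shuffle the set and cut it into equal-sized groups (an assumption)."""
    items = PRIME_POWERS[:]
    rng.shuffle(items)
    size = len(items) // parts
    return base_from_groups(
        [items[i * size:(i + 1) * size] for i in range(parts)])


def pairwise_coprime(base):
    """True if every distinct pair of moduli has greatest common divisor 1."""
    return all(gcd(p, q) == 1 for p, q in combinations(base, 2))


example = base_from_groups([(16, 9, 11), (5, 13, 23), (7, 17, 19)])
base = random_base(random.Random(1))
```

Every base produced this way is pairwise relatively prime and covers the same total range, 5,354,228,880, because each is a different grouping of the same nine factors.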

The invention allows a system of many bases with hidden 
conversion between those bases. As well, it allows the 
solution constants to be exposed without exposing the bases
themselves. The original bases used to convert the software
to residue numbers are not required to run the software, but
would be required to decode the software back to the
original high level source code. The invention allows a set
of solution constants to be created which may run the
software, without exposing the original bases. Therefore, the
solution constants are of no assistance to the attacker in
decoding the original software, or reverse engineering it.

To hide the conversion of a residue number, r, defined by
a vector of remainders (r_1, r_2, ... r_n) derived using a base
of pairwise relatively prime numbers (b_1, b_2, ... b_n), a vector of
solution constants is derived as follows. Firstly, using the
method of Knuth, a vector of constants (c_1, c_2, ... c_n) may
be determined which provides the original integer by the
calculation:

r_i = r_1 c_1 + r_2 c_2 + ... + r_n c_n (mod b_i) (16)

where b_i is the ith number in the vector of pairwise relatively
prime numbers {b_1, b_2, ... b_n}. As each of the corresponding
r_1, r_2, ... r_n are residues, they will all be smaller than b_i,
therefore equation (16) may be simplified to:

r_i = (c_1 mod b_i) x r_1 + (c_2 mod b_i) x r_2 + ... + (c_n mod b_i) x r_n (17)

Each component (c_i mod b_j) will be a constant for a given
basis, and can be pre-calculated and stored so that the
residue numbers can be decoded, and the software executed,
when required. Because the vector of (c_i mod b_j) factors are
not relatively prime, they will have common factors.
Therefore, the base {b_1, b_2, ... b_n} can not be solved from
knowledge of this set of factors. Therefore storing this set
of solution constants with the encoded software does not
provide the attacker with any information about the old or
the new bases.

Division of Residue Numbers

Most texts like Knuth also indicate that division is impossible.
However, the invention provides a manner of division
by a constant.

In order to perform division by a constant using residue
numbers, the divisor must be one of the numbers of the base:

Let: the base be {b_1, b_2, ... b_n},
the divisor be b_i, which is a member of the set {b_1,
b_2, ... b_n}, and
the quotient be {q_1, q_2, ... , q_n}.

Then, to calculate q_j (where i is not j):

q_j = (c_j/b_i mod b_j) * r_j + ((c_i - 1)/b_i mod b_j) * r_i (19)

The algebraic derivation is straightforward, by symbolically
performing the full decoding and division. The key is
the observation that all the other terms vanish due to the
construction of the c_i's.

To calculate q_i, the terms do not vanish, so a computation
must be made of:

q_i = (c_1/b_i mod b_i) * r_1 + ... + (c_n/b_i mod b_i) * r_n (20)

This equation does not take account of the range reduction
needed, so a separate computation is used to calculate the
number of times the range has been wrapped around, so that
the proper value may be returned:

>*'H(cA)xri4-. . . +(c»r„KrangeSi2e/i>XrangeSi2e/i'^) (21)

Therefore, the decoded integer value becomes:

xfe^;+(rangeSize/6,)j£H', (22)

FIG. 6 presents a flow chart of a simple implementation
of a Residue Number encoding phase, in a preferred embodiment
of the invention. The routine begins at step 68 by
establishing a base set of pairwise relative primes, for
example, the set of {16, 9, 5, 7, 11, 13, 17, 19, 23} as
presented above. At step 70, a base is computed from this set
as previously described, such as {1584, 1495, 2261}. A
suitable block of software code is selected from the SSA
graph and is transformed into residual form at step 72. If
operators are found which are not calculable in the residue
domain, then they will be identified in the phase control file,
and those operators and their associated variables will be
encoded using a different technique. At step 74, a corresponding
set of solution constants is then calculated and is
stored with the tamper-resistant program. As noted above,
these solution constants are needed to execute the program,
but do not provide the attacker with information needed to
decode the tamper-resistant program.

At step 76, a decision block determines whether the entire
SSA graph has been traversed, and if not, the compiler steps
incrementally to the next line of code by means of step 78.
At step 80, a determination is made whether to select a new
base from the set of pairwise relative primes by returning to
step 70, or to continue with the same set by returning to step
72. Alternatively, one could return to step 68 to create a
completely new base set, though this would not generally be
necessary.

Once the decision block at step 76 determines that the
SSA graph has been traversed, the phase is complete.

Bit Exploded Coding

Like the residue number coding above, the bit-exploded
technique encodes one virtual register (VR) or other
variable into several new variables.

The idea is to convert one n-bit variable into n Boolean
variables. That is, each bit of the original variable is stored
in a separate and new Boolean variable. Each such new
Boolean variable is either unchanged or inverted by interchanging
true and false. This means that for a 32-bit
variable, there are 2^32, a little over 4 billion, bit-exploded
codings to choose from.

This encoding is highly suitable for code in which bitwise
Boolean operations, constant shifts or rotations, fixed bit
permutations, field extractions, field insertions, and the like
are performed. Shifts, rotations, and other bit rearrangements
have no semantic equivalent in high-level code, since
they specifically involve determining which bits participate
in which Boolean operations.

For other Boolean operations, the complement operation,
which takes a complemented input (if unary) or two complemented
inputs (if binary) and returns a complemented result,
is clear by application of de Morgan's laws, so dealing with
the inversion of some of the variables in the bit-exploded
representation is straightforward. Recall that de Morgan's
first law states that: not ((not x) and (not y)) = x or y, and
second law states that: not ((not x) or (not y)) = x and y. In
general, if op is a binary operation, it is desirable to use the
operation op2 such that:

x op2 y = not ((not x) op (not y))

Examples would be that the complement of the and operation
is or, and the complement of the or operation is and. The
same strategy applies to other operations as well.

For bitwise Boolean operations, either the operation or
its complement on each bit is performed. For example, if a
4-bit variable x has been exploded into 4 Boolean variables
a, b, c, d, with a and d uninverted and b and c inverted, then
where y has similarly been encoded as a', b', c', d' and z is
to be encoded similarly as a", b", c", d", the operation:

z = x and y

may be performed by computing:

a" = a and a'
b" = b or b'
c" = c or c'
d" = d and d'

since the or operation is the complement of the and
operation, and it is the b and c components of each variable
which are complemented.

This encoding results in a substantial increase in the
number of operations relative to the original program,
except for operations which can be "factored out" because
they can be done by reinterpreting which variables represent
which bits or which bits in the representation are
inverted.

Some of this expansion may be avoided by using the
optimization routine described hereinbelow.

FIG. 7 presents a flow chart of an exemplary implementation
of bit-exploded encoding. The routine begins at step
82, where a variable or set of variables is identified for
boolean encoding. At step 84, a corresponding set of boolean
variables is defined for each original variable. Additional
lines of software code are then added at step 86 to redefine
the new boolean variables using shifts, rotations, inversions
and other transforms as described hereinabove. These variables
and their transforms are recorded in the "phantom
parallel program", so that the outputs of the program can be
rationalised when required. Note that variables which are
completely internal to the program, may never be rationalised
in this manner.

At step 88, a decision block determines whether the entire
SSA graph has been traversed, and if not, the compiler steps
incrementally to the next variable, line of code, or block of
code, by means of step 90. If the entire SSA graph, or at least
the target SSA code has been traversed, the phase is complete.

An Optimization: Bit-Tabulated Coding

In the bit-exploded technique described above, the resulting
code may be excessively bulky and slow to execute.

as many as 12, this is a viable approach, and can result in
substantial savings of memory space and/or increased speed
in execution compared to bit-exploded encoding.

Moreover, bit-tabulated encoding is compatible with the
bit-exploded encoding, and it is preferable to combine the
techniques where opportunities occur.

The Reverse Transformation: Bit-Tabulated to Bit-Exploded

The bit-tabulation encoding is an optimization of
bit-exploded coding. Sometimes it is useful to perform the
reverse of this transformation. That is, to transform a
table-lookup with the above-described characteristics into a
network of Boolean operations. This is straightforward, and
algorithms for converting from such tables into such networks
can be found in many books on circuit theory, for
example, Switching Theory, by Paul E. Wood, Jr., McGraw-Hill
Book Co., 1968, Library of Congress Catalog Card
Number 68-11624.

An example where this reverse transformation is useful is
where one wishes to disguise the tables. For example, one
may convert from the bit-tabular form to the bit-exploded
form, which involves the injection of random bit inversions,
and then when optimization converts parts of the code back
into bit-tabular form, the tables are drastically disguised and
changed. Thereby, this provides an effective means for
data-coding small tables used in table lookup operations.

For example, one may hide Data Encryption Standard
(DES) keys using Bit-Exploded and Bit-Tabulated coding.
DES is currently the most widely known and studied encryption
algorithm. Moreover, triple-DES variants of DES continue
to be used for encryption even in quite secure
applications.

The DES algorithm is well suited for a combination of the
bit-exploded and bit-tabular encodings. By performing
tamper-resistant data-encoding on a routine with an embedded
constant key, which performs DES encryption, for
example, a tamper-resistant software routine may be produced
which still performs DES encryption, but for which
extraction of the key is a very difficult task. This extraction
is particularly difficult if a fully-unrolled implementation is
used, that is, one in which the 16 rounds of DES are
separated into individual blocks of code instead of being
implemented by a loop cycling 16 times. Such unrolling can

However, an optimization may be performed which reduces easily be performed with a text editor prior to execution of 

these inefficiencies. the tamper- resistant encoding. 

Bit-exploded coding may produce data-flow networks This is clear from consideration of the DES algorithm, 

having subnetworks with the following properties: The entire DES encryption process consists of small shifts, 

they have only a reasonably small number of inputs; and 50 t>it permutations or bit transforms very similar to 

they are acyclic; that is, contain no loops. permutations, and lookups in small tables called S-boxes 

When this occurs, one can replace the entire network or which are already in the ideal form for the bit-tabular to 

subnetwork with a table lookup. This results from the fact bit-exploded form mentioned above, 

that an m-input, n-output Boolean function can be repre- For example, given a subroutine which computes DES, in 

sented by a zero-origin table of Z" n-bit elements. Instead of 55 which the key is embedded in the routine body as a constant, 

including the network in the final encoded program, it is so that it computes DES for only this one key, and in which 

simply replaced with a corresponding table lookup, in which the loop representing the 16 'rounds' of DES has been 

one indexes into the table using the integer index formed by unrolled, either by linrolling it at the source level, or by 

combining the m inputs into a non-negative integer, obtain- applying aggressive loop unrolling to unroll the rounds in 

ing the n-bit result, and converting it back into individual 60 the code optimizer, this routine may be encoded according 

bits. Note that the positions of the bits in the index and the to the method of the invention as follows: 

result of the above lookup can be random, and the network 1. The entire routine is encoded using the bit-exploded 

can be previously encoded using the bit-exploded coding, so encoding, and using the conversion from bit-tabular to 

the encoding chosen for the data is not exposed. bit-exploded on the S-boxes. Note that the small shifts, 

It is desirable that the number of inputs to the table be 65 word splits, and permutations disappear as they are 

small, to keep the table from becoming excessively large. simply re-interprctations of the identities of the Bool- 

However, for anything up to eight inputs, and sometimes for cans. This is only true with the unrolled version where 
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the shift for each round is a constant. At this point, the 
code contains excessive bulk, but may be reduced. 

2. The code produced above is now reduced using conventional
constant folding. The effect is that the key has
now completely disappeared, but the code bulk remains
excessively large.

3. Further encoding is now performed by recoding using 
the bit-exploded to bit-tabular optimization. 

A completely different set of S-boxes has now been
produced which bears no discoverable relation to the original
ones and corresponds only to the encoded data. The
positions of the bits, and to some extent even which part of 
the computation has been assigned to which S-box, is now 
radically changed. 
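By way of illustration only, the net effect of steps 1 to 3 on a single small table may be sketched as follows. The sketch works directly at the table level rather than constructing and folding the intermediate Boolean network, and it assumes a 4-bit toy table in place of a real DES S-box. The names TOY_SBOX, KEY, make_bit_coding and build_encoded_table are hypothetical and do not appear in the specification.

```python
import random

# Hypothetical 4-bit toy table standing in for an S-box (an assumption).
TOY_SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]
KEY = 0b1011  # constant key material that is folded into the table


def make_bit_coding(nbits, rng):
    """Pick a random bit permutation and a random inversion mask."""
    perm = list(range(nbits))
    rng.shuffle(perm)
    mask = rng.randrange(1 << nbits)
    return perm, mask


def encode_bits(value, coding):
    """Move bit i to position perm[i], then invert the masked bits."""
    perm, mask = coding
    moved = 0
    for i, target in enumerate(perm):
        moved |= ((value >> i) & 1) << target
    return moved ^ mask


def decode_bits(value, coding):
    """Inverse of encode_bits; only the encoder tool needs this."""
    perm, mask = coding
    value ^= mask
    plain = 0
    for i, target in enumerate(perm):
        plain |= ((value >> target) & 1) << i
    return plain


def build_encoded_table(sbox, key, in_coding, out_coding):
    """Tabulate x -> sbox[x ^ key] over coded inputs and coded outputs."""
    table = [0] * len(sbox)
    for x in range(len(sbox)):
        coded_index = encode_bits(x, in_coding)
        table[coded_index] = encode_bits(sbox[x ^ key], out_coding)
    return table


rng = random.Random(2024)  # seeded so that the sketch is repeatable
IN_CODING = make_bit_coding(4, rng)
OUT_CODING = make_bit_coding(4, rng)
ENCODED_TABLE = build_encoded_table(TOY_SBOX, KEY, IN_CODING, OUT_CODING)
```

In this sketch only ENCODED_TABLE would be shipped. The key, the original table and both codings remain with the encoder, which is the sense in which the key has "disappeared" from the routine.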

The same process can be used to create a routine which 
performs the corresponding decryption. 

The above method for hiding DES keys may not be 
particularly useful on its own, since an attacker with access 
to the encryption and decryption routines could simply use 
the routines themselves, instead of the keys, to achieve what
could otherwise have been achieved by knowing the keys.
However, if DES or triple-DES is embedded in a larger 
program, use of the control-flow encoding in concert with 
data-flow encoding in a manner of the invention, makes the 
above technique highly useful, since it is then no longer 
possible to extract the encryption and decryption routines in 
isolation. 

There are many uses for software applications which
embed and employ a secret encryption key without making 
either the key or a substitute for the key available to an 
attacker. The method of the invention can generally be 
applied to these applications. 
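The two bit-level codings discussed in this section may be illustrated with a minimal sketch, offered under stated assumptions rather than as the implementation of the invention. A 4-bit variable is exploded into four booleans of which the second and third are stored inverted, so that a bitwise and of two coded variables uses "and" on the plain components and "or" on the inverted ones. The resulting 8-input, 4-output network is then tabulated into a zero-origin table of 2^8 elements. The bit ordering and all function names are assumptions made for the sketch.

```python
# Which components of the exploded variable are stored inverted
# (components a, b, c, d; here b and c are inverted, an assumption
# matching the description of the coded "and" operation).
INVERTED = (0, 1, 1, 0)


def unpack(number, width):
    """Split a non-negative integer into a list of bits, bit 0 first."""
    return [(number >> i) & 1 for i in range(width)]


def pack(bits):
    """Combine a list of bits (bit 0 first) into a non-negative integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def explode(value):
    """Bit-exploded coding: one boolean per bit, some stored inverted."""
    return [bit ^ inv for bit, inv in zip(unpack(value, 4), INVERTED)]


def implode(coded_bits):
    """Recover the plain value; needed only to rationalise outputs."""
    return pack([bit ^ inv for bit, inv in zip(coded_bits, INVERTED)])


def coded_and(p, q):
    """AND of two coded variables: 'or' replaces 'and' on inverted bits."""
    return [(pb | qb) if inv else (pb & qb)
            for pb, qb, inv in zip(p, q, INVERTED)]


def tabulate(network, m):
    """Replace an acyclic m-input network by a table of 2**m entries."""
    return [pack(network(unpack(index, m))) for index in range(1 << m)]


# Bit-tabulated form of the coded AND: 8 coded input bits, 4 coded outputs.
AND_TABLE = tabulate(lambda bits: coded_and(bits[:4], bits[4:]), 8)
```

A lookup such as AND_TABLE[pack(explode(x) + explode(y))] then yields the coded result directly, without the plain values of x and y ever being formed in the encoded program.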

Custom Base Coding 

As noted above, custom base coding provides the optimal
tamper-resistance in view of the three targeted properties:
anti-hologram, fake-robustness and togetherness.
However, this performance is at the expense of memory and 
necessary processing power. Therefore, it may be desirable 
to only use this technique in certain portions of the target 
program, and to use techniques which are less demanding of 
system resources in other areas of the target. 

In broad terms, this coding technique is a variable trans- 
form in a custom coordinate space. For example, values 
defined on an (x, y) coordinate space could be transformed 
onto a (x-y, x+y) coordinate space. Such a transformation 
would give the visual impression of a 45° rotation. Of
course, this coding transformation may also be 
n-dimensional, so the visual analogy to 2 dimensions is a 
limited analogy. Note that the vectors need not be 
orthogonal, but they must be independent in order to span 
the vector space. That is, if there are n vectors, they must
form the basis for an n-dimensional vector space.

For a simple example, variable "x" is grouped with some 
other variables such as "y" and "z", that may be part of the
program or decoy variables that have been created. Then an 
invertible map to some other set of variables is created. This 
technique basically treats x, y, z as basis vectors in some 
coordinate space, and the mapping is just the change to a 
different basis. 
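As a minimal sketch of such a change of basis, the (x-y, x+y) transform mentioned above may be written out as follows. The function names are hypothetical, and integer variables are assumed so that the inverse map is exact.

```python
def encode(x, y):
    """Map (x, y) onto the custom coordinate space (x - y, x + y)."""
    return (x - y, x + y)


def decode(u, v):
    """Inverse map: x = (u + v) / 2 and y = (v - u) / 2.

    For integer x and y both sums are even, so integer division is exact.
    """
    return ((u + v) // 2, (v - u) // 2)


def coded_add(p, q):
    """Add two coded pairs; the map is linear, so addition is per component."""
    return (p[0] + q[0], p[1] + q[1])


def coded_scale(p, k):
    """Multiply both underlying variables by the constant k."""
    return (k * p[0], k * p[1])
```

For instance, coded_add(encode(3, 10), encode(4, -2)) decodes to (7, 8), although neither 3, 10, 4 nor -2 appears as an operand of the coded addition.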

In the same manner as the polynomial and bit-transform 
techniques, the details of the custom base transformation are 
not required to execute the program, so they may be dis- 
carded once it is complete. Therefore, there are no secrets 
left in the executable tamper-resistant program that an 
attacker may use to decode it.

If this transform was executed on a single equation, it
would be possible to identify what has been done, and to
reverse the transformation. However, with multiple
equations, the inverse transformation would be very difficult
to calculate. As well, there are additional degrees of freedom
which increase the complexity, and reduce the traceability by
orders of magnitude. For example:

1. Variables need not be grouped with other variables that 
have either related function or location. In fact, it is 
desirable to use disparate variables as an attacker would 
be less likely to look towards diverse and unrelated 
areas of the program for interdependency. 

2. Decoy variables may be added to the SSA graph and 
included in the transform. A particular area of a soft- 
ware program, for example, the copy protection area, 
may be made the focus of this coding technique. Since 
the code in the copy protection area of the software 
program is rarely executed, this technique could be 
used to add 10,000 or so operations to this area. The 
user would not generally be inconvenienced by the 
additional fraction of a second it would take to execute 
the tamper-resistant copy protection code, or the 40 
kilobytes or so of memory it would require. An 
attacker, however, would not be able to decode these 
operations using traditional reverse-engineering tech- 
niques and would have to analyse them by hand. 
Therefore, this method would have tremendous utility
for tamper-resistance. 

3. It is also straightforward to scale this coding technique 
to handle n-dimensions. This creates a large matrix of 
interdependent equations, spread throughout diverse 
areas of the tamper-resistant software program. This 
way, a small change in one area of the program may 
have consequences in many other areas. An attacker 
would not be able to identify which areas would be 
affected by a given change. 

4. The bases for the custom-base codings may be changed 
almost continuously as the tamper-resistant software is 
compiling, provided that the tamper-resistant compiler 
remembers the encodings, so that the operation and 
outputs of the tamper-resistant software remain coor- 
dinated. 

FIG. 8 represents a simple application of this technology 
in a preferred embodiment of the invention. The routine 
begins at step 92, where a variable or set of variables is 
identified for custom base encoding. At step 94, decoy 
variables are added if necessary, bringing the number of 
variables to n. At step 96, additional lines of software code
are then added to map these n variables onto a new 
n-dimensional space. These variables and their transforms 
are recorded in the "phantom parallel program", so that the 
outputs of the program can be rationalised when required. 
Note that variables which are completely internal to the 
program, may never be rationalised in this manner. 

At step 98, a decision block determines whether the entire 
SSA graph has been traversed, and if not, the compiler 
continues to analyse the SSA graph, by means of step 100. 
When the entire SSA graph, or at least the target SSA
code has been traversed, the phase is complete. 
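Steps 92 to 96 may be illustrated by the following sketch, which assumes n = 3, one genuine variable and two decoy variables, and an integer basis matrix of determinant 1 so that the inverse is also an integer matrix. BASIS, INVERSE and the function names are hypothetical choices made for the sketch and are not taken from the specification.

```python
# Hypothetical basis for a 3-dimensional custom coordinate space.
# Its determinant is 1, so INVERSE below is an exact integer inverse.
BASIS = [[1, 1, 0],
         [0, 1, 1],
         [1, 1, 1]]
INVERSE = [[0, -1, 1],
           [1, 1, -1],
           [-1, 0, 1]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(x, decoys):
    """Group x with two decoy variables and map them onto the new basis."""
    return mat_vec(BASIS, [x] + list(decoys))


def add_to_x(coded, k):
    """Perform x = x + k entirely in the coded space.

    Adding k to x corresponds to adding k times the first column of BASIS.
    """
    delta = mat_vec(BASIS, [k, 0, 0])
    return [c + d for c, d in zip(coded, delta)]


def decode_x(coded):
    """Rationalise an output: recover x from the coded vector."""
    return mat_vec(INVERSE, coded)[0]
```

With decoys 123 and -45, the value 7 is held as the vector [130, 78, 85]. Only the encoder, which records BASIS in the phantom parallel program, needs INVERSE.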

Choosing Random Numbers 

For all the coding schemes, a large number of random
numbers are required. For repeatability to aid debugging,
pseudo-random numbers may be advantageously used.
Given that a large number of random numbers are required
and are used in many ways, truly random numbers such as
those produced from radioactive decay, are not necessary,
but would offer increased tamper-resistance. Presently,
computer peripheral devices for the generation of truly random
bits using random electronic fluctuations are commercially
available.

The more interesting question is how to pick the coeffi- 
cients and bases for the various codings. The particulars of 
those selection strategies are outlined in the discussion of the 
techniques themselves. 
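The trade-off between repeatability and unpredictability may be sketched as follows, assuming that the encoder draws its coefficients through a single helper. The helper name pick_coefficients and its parameters are hypothetical. A fixed seed gives a repeatable pseudo-random sequence for debugging, while omitting the seed falls back on the operating system's entropy source.

```python
import random


def pick_coefficients(count, seed=None, low=1, high=2 ** 16):
    """Return `count` random coefficients in the range [low, high).

    With a seed, the sequence is repeatable, which aids debugging.
    Without one, random.SystemRandom draws from the operating system's
    entropy source, so the sequence cannot be reproduced.
    """
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    return [rng.randrange(low, high) for _ in range(count)]
```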

Preferred Implementation 

It is not sufficient merely to pick random codings, but the
codings must be selected and coordinated so that each 
producer and consumer agree on the interpretation/coding at 
every point. As described above, there are instances where
the program is such that a given selection will not nicely line 
up everything and a new coding must be selected using a 
Recode operation. 

There are many different ways to implement the 
invention, keeping in mind that the goal is to minimize the 
times that data appear "in the plain" and to avoid outputting 
the magic numbers into the scrambled program. One very 
simple way is to divide the work into several phases, first
assigning codings, then actually performing the changes. An
example of such an implementation is presented in the flow
chart of FIGS. 9a and 9b, which present the following
steps:

1. Compile the original program into static single assign- 
ment form at step 102 of FIG. 9a. As noted above, it is 
preferred to execute this step using a standard compiler
front end suitable to the application. 

2. Optionally, optimize the intermediate code at step 104. 

3. Walk the SSA graph to gather constraints at step 106.
Examples of such constraints would include: 

a. identifying "merge" nodes. In static single assignment
a merge node does nothing, but requires that all
its input/output have the same coding;

b. if a divide by a constant is chosen to be coded using 
the residue number technique, then the divisor must 
be part of the base; and 

c. identifying input and output variables. 

4. For each of the input and output variables, assign any 
pre-defined coding at step 108. Also, variables whose 
values are inherently exposed may also be Null coded. 
For example, if the outcome of a comparison will either 
be True or False, it is difficult to hide the behaviour of
any boolean branch which employs it, so there is no 
advantage in tamper protecting it. There may however, 
be instances where there is an advantage to encoding 
such a comparison, for example, if the control flow is 
to be encoded in some manner. 

5. As noted above, it is preferred to perform the
tamper-resistant techniques in many phases to reduce the
possibility of error and improve the ease of trouble-shooting.
Steps 110 through 116 are performed until each desired 
phase has been completed, which is determined at the 
decision point 110. As noted above, the coordination of 
the phases is administered by a "phase control file". 

6. Walk the SSA graph at step 112 to propagate a
proposed set of virtual register codings into a phantom 
parallel program. If a virtual register has a coding, then 
examine its producer operation and consumer opera- 
tions to propagate the encoding and generate new 
encodings where required. When the first virtual register
reaches an operation, assign a coding for that
operation, which will usually assign codings to all its
input/output virtual registers.

7. The decision block at step 114 identifies inconsistencies 
or unallowable conditions in a proposed encoding 
which would cause it to be disallowed. In such a 
circumstance, control passes to step 116 to propose and 
analyse a new coding. If a coding is allowed, control 
passes back to step 110 for the next phase to be 
performed. 

8. For operations that are left, some random, but 
allowable, coding may be chosen and propagated to its 
input and output virtual registers at step 118 of FIG. 9b. 

9. For each virtual register, all of which now have a coding
stored in the phantom parallel program, generate a new
set of virtual registers to contain the coded values at
step 120. For codings like Custom Base, several original
virtual registers will map into the same set of new
virtual registers.

10. For each operation, gather the associated input and 
output virtual registers and the corresponding new 
coded virtual registers at step 122. Expand the opera- 
tion into whatever is required. In the preferred 
embodiment, this is done using a dedicated language to 
help perform the mapping between original and coded 
virtual registers, but that is merely a matter of
programming convenience.

11. The tamper-resistant intermediate code is then compiled
into tamper-resistant object code using a standard
compiler back end 32. As a refinement, prior to the
conversion to a specific executable object code in the 
back end 32, one may take individual instructions and 
move each to one or more new locations, where per- 
mitted by their data flow and control flow dependen- 
cies. This increases the extent to which the encoded 
software exhibits the togetherness and anti-hologram 
properties. 

The preferred routine is then complete. 
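The constraint gathering of step 3 and the coding assignment of steps 4 to 8 may be sketched, in greatly simplified form, as follows. The sketch assumes that each SSA operation is a tuple of (operation name, output register, input registers), that the only constraint is that a merge node and its inputs share one coding, and that exposed input and output registers are Null coded. The function name, the tuple layout and the list of scheme names are assumptions made for illustration.

```python
import random


def assign_codings(ops, exposed,
                   schemes=("polynomial", "residue", "custom_base"),
                   seed=0):
    """Assign one coding per virtual register, honouring merge constraints.

    ops     -- list of (name, output_register, input_registers)
    exposed -- registers whose values are inherently visible (Null coded)
    """
    parent = {}

    def find(reg):
        parent.setdefault(reg, reg)
        while parent[reg] != reg:
            parent[reg] = parent[parent[reg]]  # path halving
            reg = parent[reg]
        return reg

    def union(a, b):
        parent[find(a)] = find(b)

    # Gather constraints: a merge node forces one coding on all its
    # inputs and its output.
    for name, out, ins in ops:
        find(out)
        for reg in ins:
            find(reg)
            if name == "merge":
                union(out, reg)

    rng = random.Random(seed)  # seeded for repeatable debugging runs
    exposed_roots = {find(reg) for reg in exposed}
    group_coding = {}
    codings = {}
    for reg in sorted(parent):
        root = find(reg)
        if root not in group_coding:
            if root in exposed_roots:
                group_coding[root] = "null"
            else:
                group_coding[root] = rng.choice(schemes)
        codings[reg] = group_coding[root]
    return codings
```

A fuller implementation would also check each proposal for unallowable conditions and propose a new coding where necessary, as at steps 114 and 116. The sketch omits that loop.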

While particular embodiments of the present invention 
have been shown and described, it is clear that changes and 
modifications may be made to such embodiments without 
departing from the true scope and spirit of the invention. For 
example, rather than using the encoding techniques 
described, alternate techniques could be developed which 
dissociate the observable execution of a program from the 
code causing the activity. 

It is understood that as de-compiling and debugging tools 
become more and more powerful, the degree to which the 
techniques of the invention must be applied to ensure tamper 
protection, will also rise. As well, the concern for system 
resources may also be reduced over time as the cost and 
speed of computer execution and memory storage capacity 
continue to improve. 

These improvements will also increase the attacker's 
ability to overcome the simpler tamper-resistance techniques 
included in the scope of the claims. It is understood, 
therefore, that the utility of some of the simpler encoding 
techniques that fall within the scope of the claims, may 
correspondingly decrease over time. That is, just as in the 
world of cryptography, increasing key-lengths become
necessary over time in order to provide a given level of
protection, so in the world of the instant invention, increas- 
ing complexity of encoding will become necessary to 
achieve a given level of protection. 

As noted above, it is also understood that computer 
control and software is becoming more and more common. 
It is understood that software encoded in the manner of the
invention is not limited to the applications described, but
may be applied to any manner of the software stored, or
executing.

The method steps of the invention may be embodied in
sets of executable machine code stored in a variety of
formats such as object code or source code. Such code is
described generically herein as programming code, or a
computer program for simplification. Clearly, the executable
machine code may be integrated with the code of other
programs, implemented as subroutines, by external program
calls or by other techniques as known in the art.

The embodiments of the invention may be executed by a
computer processor or similar device programmed in the
manner of method steps, or may be executed by an electronic
system which is provided with means for executing
these steps. Similarly, an electronic memory means such as
computer diskettes, CD-Roms, Random Access Memory
(RAM), Read Only Memory (ROM) or similar computer
software storage media known in the art, may be programmed
to execute such method steps. As well, electronic
signals representing these method steps may also be
transmitted via a communication network.

It would also be clear to one skilled in the art that this
invention need not be limited to the existing scope of
computers and computer systems.

Credit, debit, bank and smart cards could be encoded to
apply the invention to their respective applications. An
electronic commerce system in a manner of the invention
could for example, be applied to parking meters, vending
machines, pay telephones, inventory control or rental cars
and using magnetic strips or electronic circuits to store the
software and passwords. Again, such implementations
would be clear to one skilled in the art, and do not take away
from the invention.

What is claimed is:

1. A method of increasing the tamper-resistance and
obscurity of computer software code comprising the steps
of:

encoding said computer software code into a domain
which does not have a corresponding semantic
structure, to increase the tamper-resistance and
obscurity of said computer software code by:

transforming the domains of individual operations in
said software code, and of the data used by and
computed by said individual operations in said
software code, so that each individual operation,
together with the data which it uses and the data
which it computes, occupies a different data domain
from the data domains of such operations and data in
the original software code, and so that the original of
said operations and the original of said data may not
be readily deducible from the transformed versions
of said operations and the transformed versions of
said data.

2. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the step of:

dispersing the definition of an argument into a plurality of
locations, to dissociate the observable operation of said
computer software code from said computer software
code while being executed.

3. A method as claimed in claim 2, further comprising the
subsequent step of:

moving selected individual instructions to new locations
permitted by their data flow and control flow
dependencies.

4. A method as claimed in claim 3, wherein said step of
dispersing comprises:

redefining an argument using one of the following
techniques:

polynomial coding;

residue coding;

bit-explosion;

bit-residue; or

custom base coding.

5. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the step of:

encoding said computer software code such that minor
[...] nonsensical operation when the
encoded software is executed, without causing the
[...] immediately fail.

6. A method as claimed in claim 5 further comprising the
step of:

adding code to said computer software code to allow
variables to have a broader range of values without
causing out of range errors.

7. A method as claimed in claim 6, wherein said steps of
encoding and adding code comprise the step of:

redefining an argument using one of the following
techniques: polynomial coding; residue coding;
bit-explosion; bit-residue; or custom base coding.

8. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the steps of:

defining a first variable in said computer software code in
terms of a second variable in said computer software
code, so that modification of said second variable
modifies the value of said first variable.

9. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the steps of:

defining a plurality of variables in terms of one another,
so that modification of any one of said variables will
alter the definition of all of said plurality of variables.

10. A method as claimed in claim 9, wherein said step of
defining comprises:

redefining an argument using one of the following
techniques:

polynomial coding;

residue coding;

bit-explosion;

bit-residue; or

custom base coding.

11. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the steps of:

responding to a line of code defining a polynomial
equation by:

redefining each variable in said polynomial equation by
a new polynomial equation; and

selecting random values of constants in said new
polynomial equations.

12. A method as claimed in claim 11 wherein said step of
selecting comprises selecting values of constants in said new
polynomial equations to invert the sense of an arithmetic
operation in said polynomial equation.

13. A method as claimed in claim 12 wherein:

said step of redefining comprises redefining each variable
in said polynomial equation by a new first order
polynomial equation; and

said step of selecting comprises selecting values of
constants in said new first order polynomial equations to
invert the sense of an arithmetic operation in said first
order polynomial equation.

14. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the steps of:

generating and storing a set of relatively prime factors; 

transposing said computer software program by calculat- 
ing residues based on said set of relatively prime 
factors; 

calculating a corresponding set of execution constants 
which may be used to execute said encoded computer 
software program; and 

storing said set of execution constants with said encoded 
computer software program. 

15. A method as claimed in claim 14 wherein said step of 
transposing comprises selecting a block of SSA code and 
transposing said block of SSA code into a corresponding set 
of residual code by calculating residues based on said set of 
relatively prime factors. 

16. A method as claimed in claim 1, wherein said steps of 
encoding and transforming comprise the step of: defining an 
n-bit variable as a corresponding set of n-boolean variables. 

17. A method as claimed in claim 16, further comprising 
the step of: adding lines of code to invert selected ones of 
said corresponding set of n-boolean variables. 

18. A method as claimed in claim 17, further comprising 
the step of: 

responding to the data flow of said computer software 
code having a reasonably small number of inputs and 
being acyclic, by replacing said corresponding set of 
n-boolean variables with a table lookup. 

19. A method as claimed in claim 1 wherein said steps of 
encoding and transforming comprise the step of: mapping a 
set of n-variables into a new n-dimensional, custom coor- 
dinate space. 

20. A method as claimed in claim 19 wherein said step of 
mapping comprises: 

mapping a set of n-independent variables into a new 
n-dimensional coordinate space defining a rotation of 
said set of n-independent variables from the original 
coordinate space. 

21. A method as claimed in claim 1, wherein said steps of 
encoding and transforming comprise the step of: 

encoding intermediate computer software code into 
tamper-resistant intermediate computer software code 
having a domain which does not have a corresponding 
semantic structure, to increase the tamper-resistance
and obscurity of said computer software code; and 
further comprising: 

a prior step of compiling said computer software pro- 
gram from source code into a corresponding set of 
intermediate computer software code; and 

a subsequent step of compiling said tamper-resistant 
intermediate computer software code into said 
tamper-resistant computer software object code.

22. A method as claimed in claim 1 wherein said steps of
encoding and transforming comprise the step of:

encoding arguments in said computer software code into
a domain which does not have a corresponding high
level semantic structure, to increase the
tamper-resistance and obscurity of said computer software
code.

23. A computer readable memory medium, storing com- 
puter software code executable to perform the steps of: 

compiling said computer software program from source 
code into a corresponding set of intermediate computer 
software code; 

encoding said intermediate computer software code into
tamper-resistant intermediate computer software code
having a domain which does not have a corresponding
semantic structure, to increase the tamper-resistance
and obscurity of said computer software code; and

compiling said tamper-resistant intermediate computer
software code into tamper-resistant computer software
object code.

24. A computer data signal embodied in a carrier wave,
said computer data signal comprising a set of machine
executable code being executable by a computer to perform
the steps of:

compiling said computer software program from source
code into a corresponding set of intermediate computer
software code;

encoding said intermediate computer software code into
tamper-resistant intermediate computer software code
having a domain which does not have a corresponding
semantic structure, to increase the tamper-resistance
and obscurity of said computer software code; and

compiling said tamper-resistant intermediate computer
software code into tamper-resistant computer software
object code.

25. An apparatus for increasing the tamper-resistance and 
obscurity of computer software code, comprising: 

front end compiler means for compiling said computer
software program from source code into a corresponding
set of intermediate computer software code;

encoding means for encoding said intermediate computer
software code into tamper-resistant intermediate
computer software code having a domain which does not
have a corresponding semantic structure, to increase
the tamper-resistance and obscurity of said computer
software code; and

back end compiler means for compiling said tamper- 
resistant intermediate computer software code into 
tamper-resistant computer software object code.